Goto

Collaborating Authors

 Larnaca





Breaking the Gradient Barrier: Unveiling Large Language Models for Strategic Classification

Lv, Xinpeng, Mao, Yunxin, Li, Haoxuan, Liang, Ke, Yang, Jinxuan, Huang, Wanrong, Chi, Haoang, Chen, Huan, Lan, Long, Chen, Yuanlong, Yang, Wenjing, Wang, Haotian

arXiv.org Artificial Intelligence

Strategic classification~(SC) explores how individuals or entities modify their features strategically to achieve favorable classification outcomes. However, existing SC methods, which are largely based on linear models or shallow neural networks, face significant limitations in terms of scalability and capacity when applied to real-world datasets with significantly increasing scale, especially in financial services and the internet sector. In this paper, we investigate how to leverage large language models to design a more scalable and efficient SC framework, especially in the case of growing individuals engaged with decision-making processes. Specifically, we introduce GLIM, a gradient-free SC method grounded in in-context learning. During the feed-forward process of self-attention, GLIM implicitly simulates the typical bi-level optimization process of SC, including both the feature manipulation and decision rule optimization. Without fine-tuning the LLMs, our proposed GLIM enjoys the advantage of cost-effective adaptation in dynamic strategic environments. Theoretically, we prove GLIM can support pre-trained LLMs to adapt to a broad range of strategic manipulations. We validate our approach through experiments with a collection of pre-trained LLMs on real-world and synthetic datasets in financial and internet domains, demonstrating that our GLIM exhibits both robustness and efficiency, and offering an effective solution for large-scale SC tasks.


VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering

Tai, Zhenghan, Wu, Hanwei, Hu, Qingchen, Chi, Jijun, He, Hailin, Ding, Lei, Kwok, Tung Sum Thomas, Xiao, Bohuai, Hua, Yuchen, Wang, Suyuchen, Lu, Peng, Li, Muzhi, Wu, Yihong, Ma, Liheng, Huang, Jerry, Zhang, Jiayi, Zhang, Gonghao, Jiang, Chaolong, Tian, Jingrui, Lyu, Sicheng, Li, Zeyu, Han, Boyu, Mo, Fengran, Yu, Xinyue, Cui, Yufei, Zhou, Ling, Wang, Xinyu

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) is becoming increasingly essential for Question Answering (QA) in the financial sector, where accurate and contextually grounded insights from complex public disclosures are crucial. However, existing financial RAG systems face two significant challenges: (1) they struggle to process heterogeneous data formats, such as text, tables, and figures; and (2) they encounter difficulties in balancing general-domain applicability with company-specific adaptation. To overcome these challenges, we present VeritasFi, an innovative hybrid RAG framework that incorporates a multi-modal preprocessing pipeline alongside a cutting-edge two-stage training strategy for its re-ranking component. VeritasFi enhances financial QA through three key innovations: (1) A multi-modal preprocessing pipeline that seamlessly transforms heterogeneous data into a coherent, machine-readable format. (2) A tripartite hybrid retrieval engine that operates in parallel, combining deep multi-path retrieval over a semantically indexed document corpus, real-time data acquisition through tool utilization, and an expert-curated memory bank for high-frequency questions, ensuring comprehensive scope, accuracy, and efficiency. (3) A two-stage training strategy for the document re-ranker, which initially constructs a general, domain-specific model using anonymized data, followed by rapid fine-tuning on company-specific data for targeted applications. By integrating our proposed designs, VeritasFi presents a groundbreaking framework that greatly enhances the adaptability and robustness of financial RAG systems, providing a scalable solution for both general-domain and company-specific QA tasks. Code accompanying this work is available at https://github.com/simplew4y/VeritasFi.git.




Explainable Fraud Detection with GNNExplainer and Shapley Values

Dao, Ngoc Hieu

arXiv.org Artificial Intelligence

The risk of financial fraud is increasing as digital payments are used more and more frequently. Although the use of artificial intelligence systems for fraud detection is widespread, society and regulators have raised the standards for these systems' transparency for reliability verification purposes. To increase their effectiveness in conducting fraud investigations, fraud analysts also profit from having concise and understandable explanations. To solve these challenges, the paper will concentrate on developing an explainable fraud detector.


A Graph Machine Learning Approach for Detecting Topological Patterns in Transactional Graphs

Zola, Francesco, Medina, Jon Ander, Venturi, Andrea, Gil, Amaia, Orduna, Raul

arXiv.org Artificial Intelligence

The rise of digital ecosystems has exposed the financial sector to evolving abuse and criminal tactics that share operational knowledge and techniques both within and across different environments (fiat-based, crypto-assets, etc.). Traditional rule-based systems lack the adaptability needed to detect sophisticated or coordinated criminal behaviors (patterns), highlighting the need for strategies that analyze actors' interactions to uncover suspicious activities and extract their modus operandi. For this reason, in this work, we propose an approach that integrates graph machine learning and network analysis to improve the detection of well-known topological patterns within transactional graphs. However, a key challenge lies in the limitations of traditional financial datasets, which often provide sparse, unlabeled information that is difficult to use for graph-based pattern analysis. Therefore, we firstly propose a four-step preprocessing framework that involves (i) extracting graph structures, (ii) considering data temporality to manage large node sets, (iii) detecting communities within, and (iv) applying automatic labeling strategies to generate weak ground-truth labels. Then, once the data is processed, Graph Autoencoders are implemented to distinguish among the well-known topological patterns. Specifically, three different GAE variants are implemented and compared in this analysis. Preliminary results show that this pattern-focused, topology-driven method is effective for detecting complex financial crime schemes, offering a promising alternative to conventional rule-based detection systems.


AMLNet: A Knowledge-Based Multi-Agent Framework to Generate and Detect Realistic Money Laundering Transactions

Huda, Sabin, Foo, Ernest, Jadidi, Zahra, Newton, MA Hakim, Sattar, Abdul

arXiv.org Artificial Intelligence

Anti-money laundering (AML) research is constrained by the lack of publicly shareable, regulation-aligned transaction datasets. We present AMLNet, a knowledge-based multi-agent framework with two coordinated units: a regulation-aware transaction generator and an ensemble detection pipeline. The generator produces 1,090,173 synthetic transactions (approximately 0.16\% laundering-positive) spanning core laundering phases (placement, layering, integration) and advanced typologies (e.g., structuring, adaptive threshold behavior). Regulatory alignment reaches 75\% based on AUSTRAC rule coverage (Section 4.2), while a composite technical fidelity score of 0.75 summarizes temporal, structural, and behavioral realism components (Section 4.4). The detection ensemble achieves F1 0.90 (precision 0.84, recall 0.97) on the internal test partitions of AMLNet and adapts to the external SynthAML dataset, indicating architectural generalizability across different synthetic generation paradigms. We provide multi-dimensional evaluation (regulatory, temporal, network, behavioral) and release the dataset (Version 1.0, https://doi.org/10.5281/zenodo.16736515), to advance reproducible and regulation-conscious AML experimentation.